Personal Knowledge Base
Why most system outage or problems happens at 2am-3am time?
- Maintenance & deployments are scheduled at night
Meaning: Most systems are intentionally changed at night to avoid disturbing users. Changes are when things break.
Why 2–3 AM specifically
- Lowest traffic window (India: ~2–6 AM IST; US/EU overlap also favors this time)
- Businesses define this as a maintenance window (pre-approved “safe” time to touch prod)
What happens
- Code deploys
- Database migrations
- Config changes
- Infra upgrades (kernel, Kubernetes nodes, load balancers)
Reality: Most outages are caused by change, not load.
This is a known SRE (Site Reliability Engineering) principle.
- Humans are tired → mistakes spike 😴
Meaning: 2–3 AM is peak human cognitive low point.
Scientifically:
- Reaction time slows
- Attention drops
- Risky decisions increase
In India especially:
- On-call engineers often work day + night on-call
- Sleep debt accumulates
- “Just push it, it’ll be fine” mindset
Examples
- Running a migration on the wrong DB
- Forgetting WHERE in a SQL update
- Restarting the wrong service/cluster
- Copy-pasting prod credentials into staging scripts
- Low traffic hides problems until suddenly it doesn’t 📉→📈
Meaning: Problems exist earlier, but they become visible only when traffic pattern shifts.
At night:
- Background jobs (ETL, cron, backups) run heavily
- Queues fill silently
- Latency increases slowly
Then:
- Morning traffic hits
- System collapses
- People think “it failed at 3 AM”, but root cause started earlier
- Infra & cloud scheduled tasks run at night ☁️
Cloud providers (AWS, GCP, Azure) often do:
- Hardware maintenance
- Network rebalancing
- Spot instance reclamation
- Auto-scaling recalculations